TAXI at SemEval-2016 Task 13: a Taxonomy Induction Method based on Lexico-Syntactic Patterns, Substrings and Focused Crawling

Authors

  • Alexander Panchenko
  • Stefano Faralli
  • Eugen Ruppert
  • Steffen Remus
  • Hubert Naets
  • Cédrick Fairon
  • Simone Paolo Ponzetto
  • Christian Biemann
Abstract

We present a system for taxonomy construction that reached first place in all subtasks of the SemEval 2016 challenge on Taxonomy Extraction Evaluation. Our simple yet effective approach harvests hypernyms with substring inclusion and Hearst-style lexico-syntactic patterns from domain-specific texts obtained via language-model-based focused crawling. Extracted taxonomies are evaluated on English, Dutch, French and Italian for three domains each (Food, Environment and Science). Evaluations against a gold standard and by human judgment show that our method outperforms more complex and knowledge-rich approaches on most domains and languages. Furthermore, only a small amount of manual labour is needed to adapt the method to a new domain or language.
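The two hypernym signals named in the abstract, substring inclusion and Hearst-style patterns, can be illustrated with a minimal sketch. This is not the TAXI implementation: the function names, the suffix-only substring rule, and the single "such as" pattern are simplifying assumptions made here for illustration, and the sketch omits scoring, lemmatization, and the focused crawling step.

```python
import re

def substring_hypernyms(terms):
    """Return (hyponym, hypernym) pairs where the hypernym is a
    whole-word suffix of the hyponym, e.g. "apple juice" -> "juice".
    Assumption: suffix matching only; no scoring is applied."""
    vocab = {t.lower() for t in terms}
    pairs = set()
    for hypo in vocab:
        for hyper in vocab:
            if hypo != hyper and hypo.endswith(" " + hyper):
                pairs.add((hypo, hyper))
    return pairs

# One illustrative Hearst-style pattern: "<hypernym> such as <h1>, <h2> and <h3>".
# Assumption: single-word hypernyms and no lemmatization of plurals.
HEARST_SUCH_AS = re.compile(r"(\w+) such as ([^.;]+)")
LIST_SEPARATOR = re.compile(r",\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+")

def hearst_hypernyms(text):
    """Return (hyponym, hypernym) pairs matched by the 'such as' pattern."""
    pairs = set()
    for match in HEARST_SUCH_AS.finditer(text.lower()):
        hyper = match.group(1)
        for hypo in LIST_SEPARATOR.split(match.group(2)):
            hypo = hypo.strip()
            if hypo:
                pairs.add((hypo, hyper))
    return pairs

if __name__ == "__main__":
    print(sorted(substring_hypernyms(
        ["juice", "apple juice", "science", "computer science"])))
    print(sorted(hearst_hypernyms(
        "They sell fruits such as apples, pears and plums.")))
```

In a fuller pipeline, pairs from both sources would typically be merged and filtered against the given domain term list before the taxonomy is assembled; how TAXI does this is described in the full paper, not in this abstract.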

Similar resources

LT3: A Multi-modular Approach to Automatic Taxonomy Construction

This paper describes our contribution to the SemEval-2015 task 17 on “Taxonomy Extraction Evaluation”. We propose a hypernym detection system combining three modules: a lexico-syntactic pattern matcher, a morphosyntactic analyzer and a module retrieving hypernym relations from structured lexical resources. Our system ranked first in the competition when considering the gold standard and manual ...

A Metric-based Framework for Automatic Taxonomy Induction

This paper presents a novel metric-based framework for the task of automatic taxonomy induction. The framework incrementally clusters terms based on ontology metric, a score indicating semantic distance; and transforms the task into a multi-criteria optimization based on minimization of taxonomy structures and modeling of term abstractness. It combines the strengths of both lexico-syntactic pat...

USAAR at SemEval-2016 Task 13: Hyponym Endocentricity

This paper describes our submission to the SemEval-2016 Taxonomy Extraction Evaluation (TExEval-2) Task. We examine the endocentric nature of hyponyms and propose a simple rule-based method to identify hypernyms at high precision. For the food domain, we extract lists of terms from the Wikipedia lists of lists by using the name of each list as the endocentric head and treating all terms in the ...

SemEval-2016 Task 13: Taxonomy Extraction Evaluation (TExEval-2)

This paper describes the second edition of the shared task on Taxonomy Extraction Evaluation organised as part of SemEval 2016. This task aims to extract hypernym-hyponym relations between a given list of domain-specific terms and then to construct a domain taxonomy based on them. TExEval-2 introduced a multilingual setting for this task, covering four different languages including English, Dut...

QASSIT at SemEval-2016 Task 13: On the integration of Semantic Vectors in Pretopological Spaces for Lexical Taxonomy Acquisition

This paper presents our participation to the SemEval “Task 13: Taxonomy Extraction Evaluation (TExEval-2)” (Bordea et al., 2016). This year, we propose the combination of recent semantic vectors representation into a methodology for semisupervised and auto-supervised acquisition of lexical taxonomies from raw texts. In our proposal, first similarities between concepts are calculated using seman...

Publication date: 2016